Abstract — We address the maximum attainable rate of fingerprinting codes under the marking assumption, studying lower and upper bounds on the value of the rate for various sizes of the attacker coalition. Lower bounds are obtained by considering typical coalitions, which represents a new idea in the area of fingerprinting and enables us to improve the previously known lower bounds for coalitions of size two and three. For upper bounds, the fingerprinting problem is modelled as a communications problem. It is shown that the maximum code rate is bounded above by the capacity of a certain class of channels, which are similar to the multiple-access channel. Converse coding theorems proved in the paper provide new upper bounds on fingerprinting capacity.

It is proved that the capacity for fingerprinting against coalitions of size two and three over the binary alphabet satisfies $0.25 \le C_{2,2} \le 0.322$ and $0.083 \le C_{3,2} \le 0.199$, respectively. For coalitions of an arbitrary fixed size t, we derive an upper bound $(t\ln 2)^{-1}$ on fingerprinting capacity in the binary case. Finally, for general alphabets, we establish upper bounds on the fingerprinting capacity involving only single-letter mutual information quantities.

Index Terms — Digital fingerprinting, channel capacity, 
multiple-access channel, strong converse theorem. 

I. Introduction 

THE distribution of licensed digital content (e.g., software, movies, music, etc.) has become increasingly popular in recent times. With this comes the need to protect the copyright of the distributor against unauthorized redistribution of the content (piracy).

To introduce the problem, we begin with an informal 
description. Suppose the distributor has some content which 
he would like to distribute among a set of licensed users. 
One can think of a simple scheme where each licensed 
copy is identified by a unique mark (fingerprint) which is 
embedded in the content and is imperceptible to the users of 
the system. Note that the distributed copies are identical except 
for the fingerprints. If a naive user distributes a copy of his 
fingerprinted content, then the pirated copy can easily be traced 
back to the guilty user and hence he will be exposed. Tracing 
the guilty user becomes more difficult when a collection of
users (pirates) forms a coalition to detect the fingerprints and
modify/erase them before illegally distributing the data. Digital
fingerprinting is a technique that assigns to each user a mark in 
a way that enables the distributor to identify at least one of the 
members of the coalition as long as its size does not exceed 
a certain threshold t, which is a parameter of the problem. 

There are two main setups considered for the fingerprinting 
problem in the literature. The distortion setting is commonly 
used in applications relating to multimedia fingerprinting [14], 
[15]. In this model, the fingerprint is usually a "covert signal" 
which is superimposed on the original "host" data in such 
a way that the difference, or distortion, between the original 
and the fingerprinted copies is smaller than some threshold. 
The coalitions are restricted to creating a forgery which has 
distortion less than some threshold from at least one of the 
colluders' fingerprinted copies. 

On the other hand, we have the marking assumption setting 
introduced in [10] which will be our main interest in this 
paper. In this case, the fingerprint is a set of redundant digits 
which are distributed in some random positions (unknown 
to the users) across the information digits of the original 
content. The fingerprint positions remain the same for all 
users. It is assumed that these redundant digits do not affect
the functionality of the content, while tampering with
an information digit damages the content permanently. The 
motivation for this assumption comes from applications to 
software fingerprinting, where modifying arbitrary digits can 
damage its functionality. 

The coalition attempts to discover some of the fingerprint 
positions by comparing their marked copies for differences. If 
they find a difference in some position, it is guaranteed to be a 
redundant fingerprint digit. In the other positions, it could be 
either an information digit or a fingerprint digit. The marking 
assumption states that the coalitions may modify only those 
positions where they find a difference in their fingerprinted 
copies. Hence, in analyzing this model, it becomes sufficient to 
just look at the fingerprint positions and ignore the information 
digits. The collection of fingerprints distributed to all the users of the system, together with the decoding (pirate identification) strategy used, is called a code below. A code is said to be t-fingerprinting, or collusion-secure against coalitions of t pirates, if the error probability of decoding approaches 0 as the code length tends to ∞.

Collusion-secure fingerprinting codes were introduced by Boneh and Shaw [10]. It was shown in [10] that for any single deterministic code, the probability of decoding error in the "wide-sense" formulation (see Section II) is bounded away from zero. Hence, to construct such fingerprinting codes, the distributor must use some form of randomization, where the random key is known only to the distributor. Boneh and Shaw also gave the first example of codes with vanishing error probability. Further general constructions were proposed by Barg et al. [6] and Tardos [17].

The case of zero error probability was considered independently by Hollmann et al. [12], who called such codes codes with the identifiable parent property, or IPP codes. They were further studied in [7], [3], [8], [16], among others.

In this paper, we are interested in computing the fundamental limits of the fingerprinting problem, i.e., in establishing bounds on the capacity (or maximum attainable rate) of fingerprinting codes. We denote by $C_{t,q}$ the capacity of fingerprinting with q-ary codes against coalitions of size t (this quantity is defined formally later in the paper). The problem of determining the fingerprinting capacity was raised in [6]. To date, only some lower bounds are known through constructions and existence results: $C_{2,2} \ge 0.2075$ [9]; $C_{3,2} \ge 0.064$ [4]; $C_{t,2} \ge (100t^2\ln 2)^{-1}$, $t \ge 2$ [17].

The new capacity bounds of this paper are based on an information-theoretic view of the fingerprinting problem. They are established as follows. Attainability results (lower bounds) are shown by random coding techniques that take into account the typical coalitions, i.e., the coalitions that occur with high probability. This represents a new idea in fingerprinting, which enables us to improve random choice arguments of various kinds used earlier in [4], [10], [9], [17]. For upper bounds, we model fingerprinting as a multi-user communications channel. A converse theorem for a transmission scenario that models some aspects of the fingerprinting problem is proved to establish an upper bound on the capacity of fingerprinting.

It should be noted that a similar information-theoretic approach to finding the capacity of fingerprinting was previously studied in [15] and [2]. In [15], the authors obtain upper and lower bounds on the capacity of fingerprinting with distortion constraints, as opposed to the marking assumption setting of this paper. Paper [1] uses the marking assumption setting, but it addresses a simpler problem whose results do not directly apply to fingerprinting.

The rest of the paper is organized as follows. In Section II, we recall the statement of the fingerprinting problem and give an information-theoretic formulation. We also prove several results related to the problem statement that justify various techniques used to derive bounds on the fingerprinting capacity later in the paper. Lower bounds on $C_{t,2}$, $t = 2, 3$, are proved in Section III. Sections IV and V are devoted to upper bounds on $C_{t,q}$ for arbitrary t, q and their specializations for t = 2, 3 in the case of the binary alphabet.

II. Problem statement 

A. Notation 

Random variables (r.v.'s) will be denoted by capital letters and their realizations by lower-case letters. The probability distribution of a r.v. X will be denoted by $P_X$. If X and Y are independent r.v.'s, then their joint distribution is written as $P_X \times P_Y$. For positive integers l, m, $X_l^{l+m}$ will denote the collection of r.v.'s $\{X_l, X_{l+1}, \dots, X_{l+m}\}$, and the shorthand $[l]$ will be used to denote the set $\{1,\dots,l\}$. Boldface will denote vectors of length n. For example, $\mathbf{x}$ denotes a vector $(x_1,\dots,x_n)$ and $\mathbf{X}$ denotes a random vector $(X_1,\dots,X_n)$. The Hamming distance between vectors $\mathbf{x},\mathbf{y}$ will be written as $\mathrm{dist}(\mathbf{x},\mathbf{y})$. We will denote the binary entropy function by $h(x) := -x\log_2 x - (1-x)\log_2(1-x)$, and $\mathbb{1}(\cdot)$ will represent the indicator function.
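Both quantities are straightforward to evaluate numerically. A minimal Python sketch follows; the helper names `binary_entropy` and `dist` are our own choices, not notation from the paper.

```python
import math


def binary_entropy(x: float) -> float:
    """h(x) = -x log2 x - (1 - x) log2(1 - x), with the convention h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def dist(x, y) -> int:
    """Hamming distance: number of coordinates in which x and y differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(x, y))


print(binary_entropy(0.5))               # 1.0
print(dist((0, 1, 1, 0), (1, 1, 0, 0)))  # 2
```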

B. Fingerprinting codes 

Let Q denote an alphabet of (finite) size q. Let M be the number of users in the system and let n denote the length of the fingerprints. Assume that there is some ordering of the users and denote their set by $\mathcal{M} = \{1,\dots,M\}$. Let $\mathcal{K}$ be a finite set whose size may depend on n. Elements of the set $\mathcal{K}$ will be called keys. For every $k \in \mathcal{K}$, let $(f_k,\phi_k)$ be an n-length code, i.e., a pair of encoding and decoding mappings:

$$f_k:\ \mathcal{M} \to Q^n \tag{1}$$

$$\phi_k:\ Q^n \to \mathcal{M}\cup\{0\} \tag{2}$$

where the decoder output 0 will signify a decoding failure. By definition, the fingerprinting system is formed by a randomized code, i.e., a random variable $(F,\Phi)$ taking values in the family $\{(f_k,\phi_k),\ k\in\mathcal{K}\}$. Note that the dependence on n has been suppressed in this notation for simplicity. The rate of this code is $R = n^{-1}\log_q M$.

The system operates as follows. The distributor chooses a key k according to a probability distribution $\pi(k)$ on $\mathcal{K}$ and assigns the fingerprint $f_k(i)$ to user i. On receiving a forged fingerprint, the distributor uses the tracing strategy $\phi_k$ (corresponding to the selected key) to determine one of the guilty users.

We need randomization because (a) deterministic fingerprinting codes do not exist in certain formulations [10], [6], and (b) we allow the family of encoders and decoders and the distribution $\pi(k)$ to be known to all users of the system. The only advantage the distributor has is the knowledge of the particular key being used. This assumption follows the accepted standards of cryptographic systems, where it is usually assumed that the encryption/decryption algorithms are publicly available and that the only parameter kept secret by the system's constructor is the key.

The fingerprints are assumed to be distributed inside the host message so that their location is unknown to the users. The location of the fingerprints, however, remains the same for all users. A coalition U of t users is an arbitrary t-subset of $\{1,\dots,M\}$. Following accepted usage, we will refer to the members of the coalition as "pirates". The coalition observes the collection of their fingerprints $f_k(U) = \{\mathbf{x}_1,\dots,\mathbf{x}_t\}$ and attempts to create a fingerprint $\mathbf{y}\in Q^n$ that does not enable the distributor to trace it back to any of the users in U. Note that although the fingerprint locations are not available to the pirates, they may attempt to detect some of these locations by comparing their copies for differences. Thus, coordinate i of the fingerprints is called undetectable for the coalition U if

$$x_{1i} = x_{2i} = \cdots = x_{ti}$$

and is called detectable otherwise.

Definition 2.1: The marking assumption states that for any fingerprint $\mathbf{y}$ created by the coalition U, $y_i = x_{1i} = x_{2i} = \cdots = x_{ti}$ in every coordinate i that is undetectable.

In other words, in creating y, the pirates can modify only 
detectable positions. 
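As an illustration of the undetectable-position test and Definition 2.1, here is a small Python sketch. Fingerprints are modelled as tuples of integer symbols, and the function names are hypothetical helpers of our own.

```python
def undetectable_positions(fingerprints):
    """Coordinates i in which all coalition fingerprints carry the same symbol."""
    n = len(fingerprints[0])
    return [i for i in range(n) if len({x[i] for x in fingerprints}) == 1]


def satisfies_marking_assumption(y, fingerprints) -> bool:
    """True iff y agrees with the coalition in every undetectable coordinate."""
    return all(y[i] == fingerprints[0][i] for i in undetectable_positions(fingerprints))


coalition = [(0, 1, 1, 0), (0, 0, 1, 1)]
print(undetectable_positions(coalition))                       # [0, 2]
print(satisfies_marking_assumption((0, 1, 1, 1), coalition))   # True
print(satisfies_marking_assumption((1, 1, 1, 1), coalition))   # False
```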

For a given set of observed fingerprints $\{\mathbf{x}_1,\dots,\mathbf{x}_t\}$, the set of forgeries that can be created by the coalition is called the envelope. Its definition depends on the exact rule the coalition should follow to modify the detectable positions:

• If the coalition is restricted to use only a symbol from their assigned fingerprints in the detectable positions, we obtain the narrow-sense envelope:

$$\mathcal{E}_N(\mathbf{x}_1,\dots,\mathbf{x}_t) = \{\mathbf{y}\in Q^n \mid y_i\in\{x_{1i},\dots,x_{ti}\},\ \forall i\}; \tag{3}$$

• If the coalition can use any symbol from the alphabet in the detectable positions, we obtain the wide-sense envelope:

$$\mathcal{E}_W(\mathbf{x}_1,\dots,\mathbf{x}_t) = \{\mathbf{y}\in Q^n \mid y_i = x_{1i},\ \forall i \text{ undetectable}\}. \tag{4}$$

We remark that there are further generalizations of the rules above, where coalitions are also allowed to erase the symbols in detectable positions. This generalization is not considered below; we refer the interested reader to [6]. In the following, we will use $\mathcal{E}(\cdot)$ to denote the envelope from any of the rules or their generalizations mentioned above.

Remark 2.2: The definition of a fingerprinting code depends on the envelope considered. Therefore, a different problem arises for each definition of the envelope. The binary alphabet is of special interest because of its wide use in practical digital applications. For this special case, it is easy to see that the narrow-sense and wide-sense envelopes are exactly the same, since in a detectable binary position the coalition already holds both symbols 0 and 1.
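The claim in Remark 2.2 can be checked by brute force on toy sizes. The sketch below (our own helper names; exhaustive enumeration, so only suitable for very short vectors) builds both envelopes from (3) and (4), confirms that they coincide for every binary pair of length 3, and exhibits a ternary pair for which the wide-sense envelope is strictly larger.

```python
from itertools import product


def narrow_envelope(fps, q):
    """E_N in (3): y_i must be one of the coalition's symbols in position i."""
    n = len(fps[0])
    return {y for y in product(range(q), repeat=n)
            if all(y[i] in {x[i] for x in fps} for i in range(n))}


def wide_envelope(fps, q):
    """E_W in (4): y_i is forced to the common symbol only in undetectable positions."""
    n = len(fps[0])
    undetectable = [i for i in range(n) if len({x[i] for x in fps}) == 1]
    return {y for y in product(range(q), repeat=n)
            if all(y[i] == fps[0][i] for i in undetectable)}


# Binary alphabet: the two envelopes coincide for every pair of length-3 fingerprints.
vectors = list(product((0, 1), repeat=3))
binary_equal = all(narrow_envelope([a, b], 2) == wide_envelope([a, b], 2)
                   for a in vectors for b in vectors)

# Ternary alphabet: the wide-sense envelope can be strictly larger.
tern = [(0, 1), (0, 2)]
print(binary_equal, len(narrow_envelope(tern, 3)), len(wide_envelope(tern, 3)))  # True 2 3
```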

Suppose that the coalition U uses a randomized strategy $V(\cdot\mid\cdot,\dots,\cdot)$ to create the new fingerprint, where $V(\mathbf{y}\mid\mathbf{x}_1,\dots,\mathbf{x}_t)$ gives the probability that the coalition creates $\mathbf{y}$ given that it observes the fingerprints $\mathbf{x}_1,\dots,\mathbf{x}_t$. A strategy V is called admissible if

$$V(\mathbf{y}\mid\mathbf{x}_1,\dots,\mathbf{x}_t) > 0 \ \text{ only if } \ \mathbf{y}\in\mathcal{E}(\mathbf{x}_1,\dots,\mathbf{x}_t).$$

Let $\mathcal{V}_t$ denote the class of admissible strategies. Such randomized strategies model any general attack the coalition is capable of and also facilitate mathematical analysis. The distributor, on observing the suspect fingerprint $\mathbf{y}$, uses the decoder $\phi_k$ while using the key k. Then the probability of error for a given coalition U and strategy V, averaged over the family of codes, is defined as follows:

$$e(U,F,\Phi,V) = \mathbb{E}_K \sum_{\mathbf{y}:\ \phi_K(\mathbf{y})\notin U} V(\mathbf{y}\mid f_K(U)) \tag{5}$$

where $\mathbb{E}_K$ is the expectation with respect to the distribution $\pi(k)$.
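To make (5) concrete, the following Monte Carlo sketch estimates $e(U,F,\Phi,V)$ for a toy system. Everything in it is an assumption chosen for illustration: each key seeds a PRNG that generates a uniformly random binary codebook, $\pi(k)$ is uniform, V copies in each position the symbol of a randomly chosen pirate (an admissible strategy), and the decoder simply accuses the nearest fingerprint. Users are indexed from 0, and this stand-in decoder never declares failure.

```python
import random


def random_code(key, M, n):
    """Hypothetical encoder f_k: the key seeds a PRNG that draws
    M independent uniform binary fingerprints of length n."""
    rng = random.Random(key)
    return [tuple(rng.randrange(2) for _ in range(n)) for _ in range(M)]


def coin_flip_strategy(fps, rng):
    """Admissible strategy V: in every position copy the symbol of a uniformly
    chosen pirate, so undetectable positions are automatically preserved."""
    n = len(fps[0])
    return tuple(rng.choice(fps)[i] for i in range(n))


def min_distance_decoder(code, y):
    """Hypothetical decoder phi_k: accuse the user whose fingerprint is closest to y."""
    return min(range(len(code)),
               key=lambda u: sum(a != b for a, b in zip(code[u], y)))


def estimate_error(U, M, n, num_keys, trials, seed=0):
    """Monte Carlo estimate of e(U, F, Phi, V) in (5), pi(k) uniform on num_keys keys."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        k = rng.randrange(num_keys)                   # draw K according to pi
        code = random_code(k, M, n)
        y = coin_flip_strategy([code[u] for u in U], rng)
        if min_distance_decoder(code, y) not in U:    # event phi_K(y) not in U
            errors += 1
    return errors / trials


print(estimate_error(U={0, 1}, M=8, n=64, num_keys=1000, trials=200))
```

Replacing `min_distance_decoder` with a tracing rule tailored to typical coalitions, as in Section III, is where the actual constructions of this paper differ.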

Definition 2.3: A randomized code $(F,\Phi)$ is said to be t-fingerprinting with ε-error if

$$\max_{V\in\mathcal{V}_r}\ \max_{U:\,|U|=r} e(U,F,\Phi,V)\le\epsilon,\quad \forall\, r\le t. \tag{6}$$



C. Fingerprinting capacity 

We now formulate the fingerprinting problem as a communications problem in which the set of messages is identified with the set of users of the fingerprinting system. Each message is mapped to a codeword which corresponds to the fingerprint of the user. Any set of t messages (a coalition) may be chosen, and they are transmitted over an unknown t-input single-output channel defined by the strategy of the coalition. The class of possible channels is defined by the marking assumption. The output of the channel (that represents the strategy) gives the forged fingerprint. The task of the decoder is to recover at least one of the messages that were transmitted to produce the channel output.

Observe that this information-theoretic model differs from the traditional t-user Multiple-Access Channel (MAC) because (a) the decoder makes an error only when its output does not match any of the transmitted messages, and (b) all channel inputs are required to use the same codebook.

For a given t-user strategy V, the maximum probability of error is given by

$$e_{\max}(F,\Phi,V) = \max_{u_1,\dots,u_t\in\mathcal{M}} e(\{u_1,\dots,u_t\},F,\Phi,V). \tag{7}$$

Note that here the users $u_1,\dots,u_t$ are not necessarily distinct. It is straightforward to see that the t-fingerprinting condition (6) can now be expressed as

$$e_{\max}(F,\Phi,V)\le\epsilon \quad\text{for every } V\in\mathcal{V}_t. \tag{8}$$

Definition 2.4: For 0 < ε < 1, a number R > 0 is an ε-achievable rate for q-ary t-fingerprinting if for every δ > 0 and every n sufficiently large, there exists a randomized q-ary code $(F,\Phi)$ of length n with rate

$$\frac{1}{n}\log_q M > R-\delta$$

and maximum probability of error satisfying (8).

The ε-capacity of q-ary t-fingerprinting $C_{t,q}(\epsilon)$ is the supremum of all such ε-achievable rates. The capacity of q-ary t-fingerprinting is the infimum of the ε-capacities for ε > 0, i.e.,

$$C_{t,q} = \lim_{\epsilon\to 0} C_{t,q}(\epsilon).$$

To study the capacity $C_{t,q}$, it is convenient to consider coalitions of size exactly t. First, given any t-user strategy V, define the maximum probability of error corresponding to coalitions of size t alone as

$$\tilde e_{\max}(F,\Phi,V) = \max_{U:\,|U|=t} e(U,F,\Phi,V). \tag{9}$$

The capacity value $\tilde C_{t,q}$ corresponding to the above criterion can be similarly defined.

Proposition 2.5:

$$C_{t,q} = \tilde C_{t,q}.$$

Clearly, $C_{t,q}\le\tilde C_{t,q}$. The proof of the opposite inequality is also almost obvious, because any coalition of t pirates can simply ignore any subset of t − r pirates when devising a forged fingerprint $\mathbf{y}$. A formal version of this argument is provided by Lemma A.1 in the Appendix.
Similarly to the above, let us consider the average error probability

$$e_{\mathrm{avg}}(F,\Phi,V) = \frac{1}{M^t}\sum_{u_1,\dots,u_t\in\mathcal{M}} e(\{u_1,\dots,u_t\},F,\Phi,V) \tag{10}$$

(this quantity will be used in the derivation of upper bounds on the capacity $C_{t,q}$) and the probability

$$\tilde e_{\mathrm{avg}}(F,\Phi,V) = \binom{M}{t}^{-1}\sum_{U:\,|U|=t} e(U,F,\Phi,V) \tag{11}$$

for coalitions of size exactly t. Define the capacity $\tilde C^{\mathrm{avg}}_{t,q}$ with respect to the latter error probability.

We make a remark on the relation between the average and maximum error criteria. In general, the average error criterion may yield a higher capacity value than the maximum one. However, when randomization is allowed, it is well known for single-user channels that the capacity value is the same for both the maximum and average error probability criteria (cf., e.g., [11, p. 223, Prob. 5]). We now extend this argument to the current context of multi-user channels and fingerprinting to show that both (9) and (11) lead to the same capacity value.

Proposition 2.6:

$$\tilde C_{t,q} = \tilde C^{\mathrm{avg}}_{t,q}.$$

A formal proof is available in the Appendix. The result follows because we may use a randomized code $(F,\Phi)$ that also includes all M! permutations of any specific realization of $(F,\Phi)$. Because of the symmetry introduced by this, the error probability $e(U,F,\Phi,V)$ is the same for all coalitions of size t for a given V, and hence the average and the maximum probabilities are the same.

III. Lower bounds for binary t-secure codes, t = 2, 3

In this section, we construct fingerprinting codes for t = 2, 3 with error probability decaying exponentially in n and with higher rate than previous constructions. The improvement is obtained by tailoring the decoder to the typical coalitions, i.e., the coalitions that occur with high probability. We will say that an event occurs with high probability if the probability that it fails is at most exp(−cn), where c is a positive constant.

Our aim is to construct a sequence of randomized codes $(F_n,\Phi_n)$, n = 1, 2, ..., with error probability

$$\max_{V\in\mathcal{V}_t} e_{\max}(F_n,\Phi_n,V)$$

decaying to zero. By Proposition 2.5, it suffices to consider only coalitions of size exactly t. Suppose that, for every n, there exists a set $T_{t,n}\subset(Q^n)^t$ such that for any coalition U of size t, the observed fingerprints $f_{K,n}(U)$ belong to $T_{t,n}$ with high probability. We will refer to a set with this property as a typical set. Thus, in constructing the required code, it suffices to study the conditions that allow us to obtain vanishing probability

$$\Pr\{\phi_{K,n}(\mathbf{y})\notin U \mid f_{K,n}(U) = (\mathbf{x}_1,\dots,\mathbf{x}_t)\}$$

as n → ∞, for any coalition U of size t, any typical t-tuple $(\mathbf{x}_1,\dots,\mathbf{x}_t)$ of observed fingerprints, and any forgery $\mathbf{y}\in\mathcal{E}(\mathbf{x}_1,\dots,\mathbf{x}_t)$. Our first result is a lower bound on the fingerprinting capacity with 2 pirates over the binary alphabet.

Theorem 3.1:

$$C_{2,2}\ge \tfrac{1}{4}.$$

Proof: Fix Q = {0, 1}. Suppose that the encoding mapping F assigns $M = 2^{nR}$ fingerprints to the users, choosing them uniformly and independently from all $2^n$ different vectors. For R < 1/2, the fingerprints will be distinct with high probability.

Given a small ε > 0, we define the typical set as the set of vector pairs which agree in l positions, where

$$l\in I_\epsilon = [n(1/2-\epsilon),\ n(1/2+\epsilon)].$$

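Consider any two pirates $u_1$ and $u_2$. Notice that their observed fingerprints form a typical pair $(\mathbf{x}_1,\mathbf{x}_2)$ with high probability for an arbitrarily small ε. Hence, $(\mathbf{x}_1,\mathbf{x}_2)$ agree in $l\in I_\epsilon$ positions. To create a forged fingerprint $\mathbf{y}$, the pirates must fill the remaining n − l positions. Let $d_1 = \mathrm{dist}(\mathbf{y},\mathbf{x}_1)$ and $d_2 = \mathrm{dist}(\mathbf{y},\mathbf{x}_2)$. In each of these n − l positions, $\mathbf{y}$ disagrees with exactly one of $\mathbf{x}_1,\mathbf{x}_2$, so $d_1+d_2 = n-l$. Since $n-l\in I_\epsilon$, we obtain

$$d_1+d_2\in I_\epsilon. \tag{12}$$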

Given a forgery $\mathbf{y}$, the decoder only considers typical pairs $(\mathbf{x}_1,\mathbf{x}_2)$ from the codebook. Namely, the decoder takes any pair of distances $(d_1,d_2)$ that satisfies (12) and constructs the full lists $S_{\mathbf{y}}(d_1)$ and $S_{\mathbf{y}}(d_2)$ of the fingerprints located at distances $d_1$ and $d_2$ from $\mathbf{y}$. Each pair $(\mathbf{x}_1,\mathbf{x}_2)\in S_{\mathbf{y}}(d_1)\times S_{\mathbf{y}}(d_2)$ is then discarded if its vectors simultaneously disagree with $\mathbf{y}$ in any position s, i.e., $x_{1s} = x_{2s}\ne y_s$. All remaining pairs contain $\mathbf{y}$ in their envelope. For each such pair $(\mathbf{x}_1,\mathbf{x}_2)$, the decoding is completed by choosing the pirate $u_i$ whose fingerprint $\mathbf{x}_i$ has the smaller distance $d_i = \mathrm{dist}(\mathbf{y},\mathbf{x}_i)$. Either user is chosen if $d_1 = d_2$.
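The decoder above can be sketched in Python for toy parameters. This is a brute-force illustration, not an efficient implementation: it enumerates all pairs of codewords directly instead of building the lists $S_{\mathbf{y}}(d_1)$ and $S_{\mathbf{y}}(d_2)$. For a pair whose envelope contains $\mathbf{y}$ we have $d_1+d_2 = n-l$, so checking typicality of the pair should select the same surviving pairs as scanning all distance pairs satisfying (12). The parameters (n = 256, M = 16, ε = 0.2), the forgery rule, and the function names are our own assumptions.

```python
import itertools
import random


def agreements(a, b):
    """Number of positions in which a and b carry the same symbol."""
    return sum(x == z for x, z in zip(a, b))


def decode_two_pirates(code, y, eps):
    """Typical-pair decoder for binary codes and t = 2 (illustrative sketch).

    Returns the set of accused users (0-based indices)."""
    n = len(y)
    lo, hi = n * (0.5 - eps), n * (0.5 + eps)
    accused = set()
    for u1, u2 in itertools.combinations(range(len(code)), 2):
        x1, x2 = code[u1], code[u2]
        if not lo <= agreements(x1, x2) <= hi:    # keep typical pairs only
            continue
        if any(a == b != c for a, b, c in zip(x1, x2, y)):
            continue                              # y is not in the envelope of (x1, x2)
        d1, d2 = n - agreements(y, x1), n - agreements(y, x2)
        accused.add(u1 if d1 <= d2 else u2)       # ties: either user may be chosen
    return accused


rng = random.Random(1)
n, M, eps = 256, 16, 0.2
code = [tuple(rng.randrange(2) for _ in range(n)) for _ in range(M)]
x1, x2 = code[3], code[7]                         # the pirates are users 3 and 7
# Admissible forgery: keep common positions, fill the differing ones at random
y = tuple(a if a == b else rng.randrange(2) for a, b in zip(x1, x2))
print(decode_two_pirates(code, y, eps))
```

With these sizes, accusing an innocent user would require an innocent fingerprint to match $\mathbf{y}$ in dozens of prescribed positions, so the returned set should contain pirates only.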

Obviously, the fingerprints $\mathbf{x}_1$ and $\mathbf{x}_2$ that belong to the actual pirates will not be discarded by the above decoding algorithm. The following probabilistic analysis shows that for two innocent users, the decoder discards their observed fingerprints $(\mathbf{z}_1,\mathbf{z}_2)$ with high probability if the code rate satisfies

$$R < 1/4.$$

Indeed, for $(\mathbf{z}_1,\mathbf{z}_2)$ to be typical, they should agree in $l\in I_\epsilon$ positions. In all these positions, $\mathbf{z}_1,\mathbf{z}_2$ should also agree with $\mathbf{y}$ to fulfill the marking assumption. In each of the remaining n − l positions, the vectors $\mathbf{z}_1,\mathbf{z}_2$ are represented by only two combinations, (01) or (10). The probability of choosing such a pair $(\mathbf{z}_1,\mathbf{z}_2)$ in our random code equals

$$P_l = \binom{n}{l}2^{n-l}/2^{2n}$$

and has exponential order $2^{-n/2}$ for any $l\in I_\epsilon$ and small ε. Furthermore, by the union bound, the total probability of choosing such a pair in a random code of size $M = 2^{nR}$ is at most

$$\binom{M}{2}\sum_{l\in I_\epsilon}P_l.$$

This probability tends to 0 exponentially fast for any rate R < 0.25.
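The exponent claim can be checked numerically. The sketch below evaluates $\frac{1}{n}\log_2(M^2 P_l)$ at the centre $l = n/2$ of $I_\epsilon$, using $M^2$ as an upper bound on the number of pairs $\binom{M}{2}$; the function names and the choice n = 10 000 are assumptions for illustration.

```python
import math


def log2_binom(n, k):
    """log2 of the binomial coefficient C(n, k), via lgamma to avoid huge integers."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)


def innocent_pair_exponent(n, R):
    """(1/n) log2 of M^2 * P_l at l = n/2, where M = 2^{nR} and
    P_l = C(n, l) 2^{n-l} / 2^{2n}.  A negative value means the bound decays."""
    l = n // 2
    log2_P = log2_binom(n, l) + (n - l) - 2 * n
    return 2 * R + log2_P / n


for R in (0.24, 0.26):
    print(R, innocent_pair_exponent(10_000, R))   # negative for 0.24, positive for 0.26
```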

Similarly, consider a coalition $(\mathbf{x}_1,\mathbf{z}_2)$ that includes the fingerprint $\mathbf{x}_1$ of an actual pirate and the fingerprint $\mathbf{z}_2$ of an innocent user. Recall that $\mathbf{x}_1$ disagrees with $\mathbf{y}$ in $d_1$ positions. Then, to be output instead of $\mathbf{x}_1$, the fingerprint $\mathbf{z}_2$ must agree with $\mathbf{y}$ in these positions and disagree with it in another set of $d_2 < d_1$ positions. The total probability of choosing such a fingerprint $\mathbf{z}_2$ is at most

$$M\,2^{n-d_1}/2^n.$$

Since $d_1+d_2\in I_\epsilon$ and $d_2 < d_1$, we have the restriction $d_2 < n(1/4+\epsilon/2)$, and hence $d_1 > n(1/4-3\epsilon/2)$. In this case, the above probability tends to 0 exponentially fast for any rate R < 0.25. Thus, at least one pirate will be chosen from each coalition, and with high probability, no remaining (innocent) users will be chosen as pirates. ■

We note that considering typical coalitions enables us to improve the lower bound $C_{2,2}\ge 0.2075$ obtained in [9].

Next we establish a lower bound on the fingerprinting 
capacity with 3 pirates over the binary alphabet. 

Theorem 3.2:

$$C_{3,2}\ge \tfrac{1}{12}.$$

Proof: Suppose again that the encoding mapping F assigns $M = 2^{nR}$ fingerprints to the users, choosing them uniformly and independently from all $2^n$ different vectors. For a triple $(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)$, let

$$C = \{s\in[n] : x_{1s} = x_{2s} = x_{3s}\},$$

$$A_{ij} = \{s\in[n] : x_{is} = x_{js}\},\quad i,j = 1,2,3,\ i\ne j,$$

and let $l = |C|$, $l_{ij} = |A_{ij}|$. Given a small ε > 0, we say that $(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)$ form a typical triple if

$$l\in J_\epsilon = [n(1/4-\epsilon),\ n(1/4+\epsilon)], \tag{13}$$

$$l_{12},\ l_{13},\ l_{23}\in I_\epsilon = [n(1/2-\epsilon),\ n(1/2+\epsilon)]. \tag{14}$$

For any three users $u_1,u_2,u_3$, note that the observed fingerprints form a typical triple with high probability.

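Using the same idea as before, we now take the observed fingerprints $(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)$ to be a typical triple. A forged fingerprint $\mathbf{y}$ agrees with all three fingerprints on C and takes arbitrary values in {0, 1} on the remaining positions $[n]\setminus C$. Let $d_i = \mathrm{dist}(\mathbf{y},\mathbf{x}_i)$ for i = 1, 2, 3. In every position of $[n]\setminus C$, two of the three fingerprints carry one symbol and the third carries the other, so the position contributes 1 or 2 to the sum $d_1+d_2+d_3$. Since $n-l\in[n(3/4-\epsilon),\ n(3/4+\epsilon)]$, this implies

$$n(3/4-\epsilon)\le d_1+d_2+d_3\le n(3/2+2\epsilon). \tag{15}$$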
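The constants in (13)-(15) come from simple column statistics of a uniformly random binary code. The following Python sketch enumerates the eight equally likely columns $(x_{1s},x_{2s},x_{3s})$ and confirms that a fraction 1/4 of them lie in C, a fraction 1/2 lie in $A_{12}$ (by symmetry the same holds for $A_{13}$ and $A_{23}$), and that every column outside C contributes 1 or 2 to $d_1+d_2+d_3$ whatever the value of $y_s$.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely columns (x_1s, x_2s, x_3s) of a uniformly random binary code
columns = list(product((0, 1), repeat=3))
constant = [c for c in columns if c[0] == c[1] == c[2]]

frac_C = Fraction(len(constant), len(columns))                         # E|C| / n
frac_A12 = Fraction(sum(c[0] == c[1] for c in columns), len(columns))  # E|A_12| / n

# Number of fingerprints disagreeing with y_s, for each non-constant column and each y_s
contributions = {sum(x != ys for x in c)
                 for c in columns if c not in constant for ys in (0, 1)}

print(frac_C, frac_A12, sorted(contributions))  # 1/4 1/2 [1, 2]
```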

Given a forged fingerprint $\mathbf{y}$, the decoder considers only typical triples $(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)$ from the codebook. Each triple is then discarded if the fingerprints in it simultaneously disagree with $\mathbf{y}$ in any position s, i.e., $x_{1s} = x_{2s} = x_{3s}\ne y_s$. If a triple $(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)$ is left, decoding is completed by choosing the pirate whose fingerprint has the smallest distance to $\mathbf{y}$ among $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3$.
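As for t = 2, the triple decoder can be sketched by brute force for toy sizes. Again, the parameters (n = 240, M = 12, ε = 0.2), the forgery rule, and all function names are illustrative assumptions, and ties in the final distance comparison are broken by user index.

```python
import itertools
import random


def dist(a, b):
    """Hamming distance."""
    return sum(x != z for x, z in zip(a, b))


def is_typical_triple(x1, x2, x3, eps):
    """Conditions (13) and (14): |C| close to n/4 and every |A_ij| close to n/2."""
    n = len(x1)
    l = sum(a == b == c for a, b, c in zip(x1, x2, x3))
    if not n * (0.25 - eps) <= l <= n * (0.25 + eps):
        return False
    return all(n * (0.5 - eps) <= n - dist(a, b) <= n * (0.5 + eps)
               for a, b in ((x1, x2), (x1, x3), (x2, x3)))


def decode_three_pirates(code, y, eps):
    """Typical-triple decoder for binary codes and t = 3 (illustrative sketch).

    Returns the set of accused users (0-based indices)."""
    accused = set()
    for triple in itertools.combinations(range(len(code)), 3):
        xs = [code[u] for u in triple]
        if not is_typical_triple(*xs, eps):
            continue
        # Discard: all three agree at position s but differ from y_s
        if any(a == b == c != s for a, b, c, s in zip(*xs, y)):
            continue
        accused.add(min(triple, key=lambda u: dist(code[u], y)))
    return accused


rng = random.Random(2)
n, M, eps = 240, 12, 0.2
code = [tuple(rng.randrange(2) for _ in range(n)) for _ in range(M)]
pirates = (1, 5, 9)
# Admissible forgery: each position copies the symbol of a randomly chosen pirate
y = tuple(code[rng.choice(pirates)][i] for i in range(n))
print(decode_three_pirates(code, y, eps))
```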

Obviously, the fingerprints $(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)$ corresponding to the actual pirates will not be discarded by the decoder. The following probabilistic analysis shows that a randomly chosen code of rate

$$R < 1/12 \tag{16}$$

enables the decoder to discard, with high probability, all typical triples $(\mathbf{z}_1,\mathbf{z}_2,\mathbf{z}_3)$ of fingerprints formed by three innocent users. Indeed, a typical triple can be identified only if the fingerprints in it simultaneously agree with $\mathbf{y}$ in some subset of $l\in J_\epsilon$ positions. To simplify our analysis in this case, we can even ignore the extra conditions (14) in the remaining n − l positions. Thus, we allow the vectors $(\mathbf{z}_1,\mathbf{z}_2,\mathbf{z}_3)$ to take on any combination of binary symbols {0, 1} other than all zeros or all ones. Given these 6 combinations, any typical triple is chosen with probability at most

$$P_l\le\binom{n}{l}6^{n-l}/2^{3n}.$$

We further observe that the total probability of choosing such a triple in a random code of size $M = 2^{nR}$ is at most

$$\binom{M}{3}\sum_{l\in J_\epsilon}P_l$$

and tends to 0 exponentially fast for any rate R < 1/12.
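The rate thresholds 1/4 and 1/12 follow from the single-letter exponent of $P_l$: by Stirling's approximation, $\frac{1}{n}\log_2\binom{n}{\lambda n}\to h(\lambda)$. The Python sketch below (hypothetical helper names) evaluates $E = h(\lambda)+(1-\lambda)\log_2 c - t$ for c admissible column patterns at the centre of the typical range, and returns the rate $-E/t$ below which $\binom{M}{t}2^{nE}$ vanishes.

```python
import math


def h(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def single_letter_exponent(lam, t, combos):
    """lim (1/n) log2 of C(n, lam*n) * combos^((1 - lam) n) / 2^(t n)."""
    return h(lam) + (1 - lam) * math.log2(combos) - t


def rate_threshold(lam, t, combos):
    """The union bound C(M, t) * 2^{nE} -> 0 for M = 2^{nR} iff t R + E < 0."""
    return -single_letter_exponent(lam, t, combos) / t


print(rate_threshold(0.5, 2, 2))    # t = 2: l/n = 1/2, column types (01), (10)
print(rate_threshold(0.25, 3, 6))   # t = 3: l/n = 1/4, six non-constant columns
```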
Now consider a slightly more involved case, when the decoder locates the pirate coalition $(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)$ along with another coalition $(\mathbf{x}_1,\mathbf{z}_2,\mathbf{z}_3)$ that includes the fingerprint $\mathbf{x}_1$ of an actual pirate and the fingerprints $\mathbf{z}_2,\mathbf{z}_3$ of two innocent users. In what follows, we prove that a random code of rate (16) satisfies at least one of the following two conditions:

(i) The decoder chooses $\mathbf{x}_1$ in the coalition $(\mathbf{x}_1,\mathbf{z}_2,\mathbf{z}_3)$ with high probability.

(ii) The coalition $(\mathbf{x}_1,\mathbf{z}_2,\mathbf{z}_3)$ has vanishing probability.

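Recall that $d_1 = \mathrm{dist}(\mathbf{y},\mathbf{x}_1)$. Then an innocent user, say $\mathbf{z}_2$, can be output by the decoder only if $\mathrm{dist}(\mathbf{y},\mathbf{z}_2) < d_1$. The probability that any such $\mathbf{z}_2$ is chosen among M random codewords is obviously at most

$$M\,2^{-n}\sum_{i<d_1}\binom{n}{i}\ \le\ M\,2^{-n(1-h(d_1/n))},\qquad d_1/n\le 1/2.$$

Given a code of rate (16), this probability vanishes if $d_1/n < 0.33$. Therefore, condition (i) above can fail only if

$$d_1/n\ge 0.33. \tag{17}$$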
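The constant 0.33 can be traced to the equation $1-h(\delta) = 1/12$. The Python sketch below solves it by bisection (our own helper, with an arbitrary iteration count); the root lies slightly above 0.33, which is consistent with the condition $d_1/n < 0.33$ used above.

```python
import math


def h(x):
    """Binary entropy in bits, for 0 < x < 1."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def critical_ratio(R, lo=1e-9, hi=0.5, iters=100):
    """Bisection for the delta in (0, 1/2) with 1 - h(delta) = R.

    For d_1/n below this value, M * 2^{-n(1 - h(d_1/n))} -> 0 at rate R."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if 1 - h(mid) > R:      # 1 - h is decreasing on (0, 1/2)
            lo = mid
        else:
            hi = mid
    return lo


delta_star = critical_ratio(1 / 12)
print(delta_star, 1 - h(0.33) > 1 / 12)
```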

Now let us consider condition (ii) given this restriction. Consider a typical coalition $(\mathbf{x}_1,\mathbf{z}_2,\mathbf{z}_3)$. We have

$$l = |\{s\in[n] : x_{1s} = z_{2s} = z_{3s} = y_s\}|,$$

$$l' = |\{s\in[n] : z_{2s} = z_{3s}\ne x_{1s}\}|.$$

Thus, the vectors $\mathbf{z}_2,\mathbf{z}_3$ have fixed values on one subset of size l, where these vectors are equal to $\mathbf{x}_1$, and on another, non-overlapping subset of size l′, where $\mathbf{z}_2,\mathbf{z}_3$ are equal to the binary complement of $\mathbf{x}_1$. According to conditions (13) and (14), $l\in J_\epsilon$ and

$$l' = l_{23} - l\in J_{2\epsilon}.$$

In the remaining n − l − l′ positions, we have

$$(z_{2s},z_{3s})\in\{(10),(01)\}.$$

Summarizing the above arguments (the l common positions must lie among the n − d_1 positions where $\mathbf{x}_1$ agrees with $\mathbf{y}$), we conclude that the total probability of choosing such vectors $\mathbf{z}_2,\mathbf{z}_3$ in the random code is bounded above as

$$M^2\binom{n-d_1}{l}\binom{n-l}{l'}\frac{2^{n-l-l'}}{2^{2n}}.$$

Straightforward verification shows that this quantity vanishes given conditions (13), (14), and (17) for a code of rate R < 0.086. Thus, a random code of smaller rate (16) discards all mixed coalitions of the form $(\mathbf{x}_1,\mathbf{z}_2,\mathbf{z}_3)$ with high probability.
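The threshold 0.086 can be reproduced from the exponent of the bound above. In the following Python sketch (hypothetical helper names), we take the central values $l/n = l'/n = 1/4$ (letting ε → 0) and $d_1/n = 0.33$. Since the derivative of $(1-\delta)h(\lambda/(1-\delta))$ with respect to δ equals $\log_2(1-\lambda/(1-\delta)) < 0$, larger $d_1/n$ only decreases the exponent, so 0.33 is the worst case allowed by (17). The $M^2$ factor contributes 2R to the exponent.

```python
import math


def h(x):
    """Binary entropy in bits, for 0 < x < 1."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def mixed_exponent(delta, lam, lam_p):
    """Asymptotic (1/n) log2 of C(n-d1, l) C(n-l, l') 2^{n-l-l'} / 2^{2n}
    with d1 = delta*n, l = lam*n, l' = lam_p*n (the M^2 factor excluded)."""
    return ((1 - delta) * h(lam / (1 - delta))
            + (1 - lam) * h(lam_p / (1 - lam))
            + (1 - lam - lam_p) - 2)


E = mixed_exponent(0.33, 0.25, 0.25)
R_max = -E / 2          # M^2 2^{nE} -> 0 iff 2R + E < 0
print(R_max)
```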

The last remaining case, of a mixed coalition $(\mathbf{x}_1,\mathbf{x}_2,\mathbf{z}_3)$, is analyzed in a similar fashion (the analysis is simpler than the one above and is omitted). ■

IV. A WEAK CONVERSE UPPER BOUND 

In finding upper bounds on fingerprinting capacity (here and also in Section V), we restrict our attention to memoryless coalition strategies in order to make the problem tractable and to obtain single-letter expressions. Any upper bound on the capacity thus obtained is also valid in the original problem.

Let $\mathcal{W}_t$ denote the family of discrete memoryless channels (DMCs) $W: Q\times\cdots\times Q\to Q$ with t inputs that satisfy the marking assumption for a single letter, i.e.,

$$\mathcal{W}_t = \{W : W(y\mid x,\dots,x) = 0 \text{ if } y\ne x,\ \forall x,y\in Q\}. \tag{18}$$
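Membership in $\mathcal{W}_t$ is a finite check on the channel matrix. The Python sketch below represents a channel as a dictionary from input tuples to output distributions (our own representation); the example channels, including one that copies the common input and otherwise outputs a fair coin, are assumptions chosen for illustration.

```python
from itertools import product


def in_W_t(W, t, q, tol=1e-12):
    """Check membership in W_t (18). W maps each input tuple (x1, ..., xt) to a
    list of output probabilities [W(0|x), ..., W(q-1|x)]."""
    for xs in product(range(q), repeat=t):
        row = W[xs]
        if len(row) != q or any(p < -tol for p in row) or abs(sum(row) - 1) > tol:
            return False                    # not a valid conditional distribution
        if len(set(xs)) == 1 and any(row[y] > tol for y in range(q) if y != xs[0]):
            return False                    # violates the marking assumption
    return True


def uniform_row(xs):
    """Assumed example (binary, t = 2): copy the common input, else a fair coin."""
    return [1.0, 0.0] if xs == (0, 0) else [0.0, 1.0] if xs == (1, 1) else [0.5, 0.5]


W_unif = {xs: uniform_row(xs) for xs in product(range(2), repeat=2)}
W_copy_first = {xs: [1.0 - xs[0], float(xs[0])] for xs in product(range(2), repeat=2)}
W_bad = dict(W_unif)
W_bad[(0, 0)] = [0.9, 0.1]                  # may output 1 although both inputs are 0

print(in_W_t(W_unif, 2, 2), in_W_t(W_copy_first, 2, 2), in_W_t(W_bad, 2, 2))  # True True False
```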

Note that the above definition corresponds to the wide-sense envelope $\mathcal{E}_W(\cdot)$ defined in (4). For the narrow-sense envelope $\mathcal{E}_N(\cdot)$ in (3) and other variations of the problem, it is possible to define similar communication channels and study upper bounds on their capacity.

Observe that $\mathcal{W}_t$ is a convex and compact set. Let $s\in\mathcal{S}_t$, called the "state", be an index which identifies the particular $W\in\mathcal{W}_t$. Hence, we will often write $W(y\mid x_1,\dots,x_t;s)$ for channels in $\mathcal{W}_t$.

We model a coalition's strategy by a (discrete memoryless) arbitrarily varying channel (AVC), i.e., the state of the channel can vary from symbol to symbol. For a given state sequence $\mathbf{s}\in\mathcal{S}_t^n$, the channel is given by

$$W^n(\mathbf{y}\mid\mathbf{x}_1,\dots,\mathbf{x}_t;\mathbf{s}) = \prod_{l=1}^{n} W(y_l\mid x_{1l},\dots,x_{tl};s_l). \tag{19}$$

We denote the family of such channels $W^n(\cdot\mid\cdot,\dots,\cdot;\mathbf{s}): Q^n\times\cdots\times Q^n\to Q^n$, $\mathbf{s}\in\mathcal{S}_t^n$, by $\mathcal{W}_t^n$.

Since the state sequence $\mathbf{s}$ completely identifies the channel, we will use $e_{\mathrm{avg}}(F,\Phi,\mathbf{s})$ to denote the error probability in (10).

A. The general case 

Theorem 4.1: Let $(F,\Phi)$ be a q-ary t-fingerprinting code with ε-error (0 < ε < 1) of length n, rate R, and $|\mathcal{K}|$ keys, such that $\epsilon q^{nR}\ge 2^t$. Then

$$R\le\frac{1}{1-2t\epsilon}\Big(\max_{P_{KX_1\cdots X_t}}\ \min_{W\in\mathcal{W}_t} I(X_1,\dots,X_t;Y\mid K)+\xi_n\Big)$$

where $\xi_n = t\log_q 2/n$, $X_1,\dots,X_t,Y$ are q-ary r.v.'s, $P_{Y|X_1\cdots X_t} = W$, K is a r.v. taking values over a set of cardinality $|\mathcal{K}|$ and satisfying the Markov chain $K\leftrightarrow X_1,\dots,X_t\leftrightarrow Y$, and the maximization is over joint distributions

$$P_{KX_1\cdots X_t} = P_K\times P_{X_1|K}\times\cdots\times P_{X_t|K} \tag{20}$$

with $P_{X_1|K} = \cdots = P_{X_t|K}$.
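Proof: Let $\mathcal{K}$ be a set of keys and let $\{(f_k,\phi_k),\ k\in\mathcal{K}\}$ be a family of codes with probability distribution $\pi(k)$ over $\mathcal{K}$. Since $(F,\Phi)$ is t-fingerprinting with ε-error, it satisfies

$$e_{\mathrm{avg}}(F,\Phi,\mathbf{s})\le\epsilon \quad\text{for every } \mathbf{s}\in\mathcal{S}_t^n. \tag{21}$$

Let $U_1,\dots,U_t$ be independent r.v.'s uniformly distributed over the message set $\{1,\dots,q^{nR}\}$, and let K be a r.v. independent of $U_1,\dots,U_t$, with probability distribution $\pi(k)$ over $\mathcal{K}$. Also, let

$$\mathbf{X}_i = f_K(U_i),\quad i = 1,\dots,t. \tag{22}$$

Fix some $\mathbf{s}\in\mathcal{S}_t^n$ and let $\mathbf{Y}$ be such that $P_{\mathbf{Y}|\mathbf{X}_1,\dots,\mathbf{X}_t} = W^n(\cdot\mid\cdot,\dots,\cdot;\mathbf{s})$. Then we have

$$\Pr(\phi_K(\mathbf{Y})\notin\{U_1,\dots,U_t\})\le\epsilon, \tag{23}$$

which follows from (21). We also have the following Markov chain:

$$U_1,\dots,U_t,K \leftrightarrow \mathbf{X}_1,\dots,\mathbf{X}_t \leftrightarrow \mathbf{Y}. \tag{24}$$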



Now,

$$I(U_1,\dots,U_t;\mathbf{Y}\mid K) = tnR - H(U_1,\dots,U_t\mid\mathbf{Y},K), \tag{25}$$

because $U_1,\dots,U_t$ are independent and uniformly distributed over $\mathcal{M}$. The second term in (25) can be bounded above as follows. Define $E_i = \mathbb{1}(\phi_K(\mathbf{Y})\ne U_i)$, i = 1, ..., t. Let $p_i = \Pr(E_i = 0,\ E_j = 1,\ j = 1,\dots,t,\ j\ne i)$, i = 1, ..., t. Since $\phi_K(\mathbf{Y}),E_1,\dots,E_t$ are known given $K,\mathbf{Y},U_1,\dots,U_t$,

$$\begin{aligned}
H(U_1,\dots,U_t\mid\mathbf{Y},K) &= H(U_1,\dots,U_t,E_1,\dots,E_t\mid\mathbf{Y},\phi_K(\mathbf{Y}),K)\\
&\le t\log_q 2 + H(U_1,\dots,U_t\mid\mathbf{Y},\phi_K(\mathbf{Y}),K,E_1,\dots,E_t) \qquad (26)\\
&\le t\log_q 2 + \epsilon tnR + 2^t q^{-nR}tnR\\
&\quad + \sum_{i=1}^t p_i\, H(U_1,\dots,U_t\mid U_i,\mathbf{Y},K,E_i = 0,E_j = 1,j\ne i) \qquad (27)\\
&\le t\log_q 2 + (\epsilon + 2^t q^{-nR})tnR + (t-1)nR.
\end{aligned}$$

Equation (26) holds true because $E_1,\dots,E_t$ are binary r.v.'s, and the term $2^t q^{-nR}tnR$ in (27) follows from the fact that there are at most $2^t$ remaining terms and each can be bounded above by $q^{-nR}tnR$. Using this in (25), we obtain

$$nR\big(1-(\epsilon+2^t q^{-nR})t\big)\le I(U_1,\dots,U_t;\mathbf{Y}\mid K)+t\log_q 2. \tag{28}$$
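We now use the premise that $\varepsilon q^{nR} > 2^t$ (so that $(\varepsilon+2^tq^{-nR})t < 2t\varepsilon$), together with (24) and the memoryless property of the channel, which results in
$$\begin{aligned}
R &\le \frac{1}{1-2t\varepsilon}\Bigl(\frac1n I(U_1,\dots,U_t;Y|K) + \varepsilon_n\Bigr)\\
&\le \frac{1}{1-2t\varepsilon}\Bigl(\frac1n I(X_1,\dots,X_t;Y|K) + \varepsilon_n\Bigr)\\
&\le \frac{1}{1-2t\varepsilon}\Bigl(\frac1n \sum_{l=1}^n I(X_{1l},\dots,X_{tl};Y_l|K) + \varepsilon_n\Bigr),
\end{aligned}$$
where $\varepsilon_n = (t\log_q 2)/n$.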



Moreover, since the above bound applies for every $\mathbf s\in\mathcal S^n$, i.e., for every $W^n\in\mathcal W_t^n$,
$$R \le \frac{1}{1-2t\varepsilon}\Bigl(\frac1n\sum_{l=1}^n \min_{W\in\mathcal W_t} I(X_{1l},\dots,X_{tl};Y_l|K) + \varepsilon_n\Bigr),$$
because the minimization is over channels whose state may vary over $\mathcal W_t$ for every letter. Note that $X_1,\dots,X_t$ are independent and identically distributed (i.i.d.) given $K$ (by (22)). Therefore, given $K$, for every $l\in[n]$, $X_{1l},\dots,X_{tl}$ are i.i.d. Hence,
$$R \le \frac{1}{1-2t\varepsilon}\Bigl(\max_{P_{KX_1\cdots X_t}}\ \min_{W\in\mathcal W_t} I(X_1,\dots,X_t;Y|K) + \varepsilon_n\Bigr),$$
where the maximization is over joint distributions satisfying (20). ■

Corollary 4.2:
$$C_{t,q} \le \min_{W\in\mathcal W_t}\ \max_{P_{X_1\cdots X_t}} I(X_1,\dots,X_t;Y), \tag{29}$$
where $X_1,\dots,X_t,Y$ are $q$-ary r.v.'s, $P_{Y|X_1\cdots X_t} = W$, and the maximization is over joint distributions such that $X_1,\dots,X_t$ are i.i.d.

Proof: Since we prove only a min-max type result, it suffices to consider "fixed" memoryless coalition strategies, i.e., strategies that remain the same at every symbol instead of varying arbitrarily. In the subsequent text, $W^n$ denotes the $n$-letter extension of a DMC $W$.

Consider any sequence of $t$-fingerprinting codes $(F_n,\Phi_n)$, $n=1,2,\dots$, of rate $R$ and error $\varepsilon_n$, where $\varepsilon_n$ approaches 0 as $n$ increases. Then
$$e_{\mathrm{avg}}(F_n,\Phi_n,W^n) \le \varepsilon_n \quad \text{for every } W\in\mathcal W_t. \tag{30}$$
Fix some $W\in\mathcal W_t$. We find that (28), with $\varepsilon$ replaced by $\varepsilon_n$, holds for every $n$.
Therefore, by the arguments in Theorem 4.1,
$$R \le \frac{1}{1-\varepsilon'_n}\Bigl(\frac1n\sum_{l=1}^n I(X_{1l},\dots,X_{tl};Y_l|K) + \frac{t\log_q 2}{n}\Bigr), \tag{31}$$
where $\varepsilon'_n = (\varepsilon_n + 2^t q^{-nR})t$; both $\varepsilon'_n$ and $(t\log_q 2)/n$ approach 0 as $n\to\infty$. Considering the inner term, we note that
$$\frac1n\sum_{l=1}^n I(X_{1l},\dots,X_{tl};Y_l|K) \le I(X_{1l^*},\dots,X_{tl^*};Y_{l^*}|K=k^*),$$
where $l^*=l^*(W)$ and $k^*=k^*(W)$ are the coordinate and key which maximize the mutual information. The term on the r.h.s. is a function of $(P_{X_{1l^*}\cdots X_{tl^*}|K=k^*},\,W)$. For every $l\in[n]$, $X_{1l},\dots,X_{tl}$ are i.i.d. when conditioned on $K$. Therefore this term is at most
$$\max_{P_{X_1\cdots X_t}} I(X_1,\dots,X_t;Y),$$
where $X_1,\dots,X_t,Y$ are $q$-ary r.v.'s with $P_{Y|X_1,\dots,X_t} = W$, and the maximization is over i.i.d. r.v.'s. Finally, since (31) is true for every $W\in\mathcal W_t$, we obtain the stated result by taking $n\to\infty$. ■

B. The binary case

Consider the case where $\mathcal Q=\{0,1\}$. We would like to evaluate the upper bound on $C_{t,2}$ given by Corollary 4.2. Computing the exact optimum in this formula is a difficult problem. Instead, we use a particular channel $W$ in (29) and compute the maximum over the prior distribution $P_{X_1\cdots X_t}$ for this channel. The resulting value of the rate gives an upper bound on the capacity $C_{t,2}$. Let $W$ be the "uniform channel" defined by
$$W(1|x_1,\dots,x_t) = \frac wt,\qquad W(0|x_1,\dots,x_t) = 1-\frac wt,$$
where $w$ is the number of 1s among $x_1,\dots,x_t$. Fig. 1 shows the uniform channel for $t=2$. Intuitively, this is the coalition strategy that is worst from the distributor's perspective.

Fig. 1. The uniform channel with 2 pirates.

If $X_1,\dots,X_t$ are independent binary-valued r.v.'s with $P(X_i=1)=p$, $0\le p\le 1$, $i=1,\dots,t$, and $Y$ is the output of the uniform channel with inputs $X_1,\dots,X_t$, we have $P(Y=1)=p$ and
$$I(X_1,\dots,X_t;Y) = h(p) - \sum_{i=0}^t \binom ti p^i(1-p)^{t-i}\,h\Bigl(\frac it\Bigr).$$
Evaluating the maximum mutual information in (29) for this channel gives the following bound.

Theorem 4.3:
$$\begin{aligned}
C_{t,2} &\le \max_{p\in[0,1]}\Bigl(h(p) - \sum_{i=0}^t \binom ti p^i(1-p)^{t-i}\,h\Bigl(\frac it\Bigr)\Bigr) \qquad (32)\\
&\le \frac{1}{t\ln 2}. \qquad (33)
\end{aligned}$$

A proof of the estimate (33) is given in the Appendix.
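The bound (32) lends itself to a quick numerical check. The Python sketch below is our own illustration, not code from the paper, and the helper names (`uniform_channel`, `mi_brute_force`, `u`, `bound_32`) are hypothetical. It implements the uniform channel, confirms that the binomial expression for $I(X_1,\dots,X_t;Y)$ agrees with brute-force enumeration over all $2^t$ inputs, and approximates the maximum in (32) on a grid so that it can be compared with $1/(t\ln 2)$ from (33).

```python
import itertools
import math


def h(x: float) -> float:
    """Binary entropy in bits, with the convention h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def uniform_channel(y: int, xs: tuple) -> float:
    """W(y | x_1, ..., x_t): the output is 1 with probability w/t, w = number of 1s."""
    prob_one = sum(xs) / len(xs)
    return prob_one if y == 1 else 1.0 - prob_one


def mi_brute_force(p: float, t: int) -> float:
    """I(X_1,...,X_t; Y) for i.i.d. Bernoulli(p) inputs, enumerating all 2^t inputs."""
    p_y1 = 0.0          # P(Y = 1)
    h_y_given_x = 0.0   # H(Y | X_1, ..., X_t)
    for xs in itertools.product((0, 1), repeat=t):
        w = sum(xs)
        px = p**w * (1 - p) ** (t - w)
        q1 = uniform_channel(1, xs)
        p_y1 += px * q1
        h_y_given_x += px * h(q1)
    return h(p_y1) - h_y_given_x


def u(p: float, t: int) -> float:
    """The expression maximized in (32): h(p) - sum_i C(t,i) p^i (1-p)^(t-i) h(i/t)."""
    return h(p) - sum(
        math.comb(t, i) * p**i * (1 - p) ** (t - i) * h(i / t) for i in range(t + 1)
    )


def bound_32(t: int, steps: int = 10_000) -> float:
    """Grid approximation (from below) of max_{p in [0,1]} u(p, t)."""
    return max(u(k / steps, t) for k in range(steps + 1))


if __name__ == "__main__":
    for t in range(2, 9):
        print(f"t={t}: (32) ~ {bound_32(t):.4f}, (33) = {1 / (t * math.log(2)):.4f}")
```

A grid search only approximates the maximum from below, so this illustrates (32) and (33) rather than proving them. For $t=2$ the maximum in (32) equals $0.5$, attained at $p=1/2$, whereas (33) gives $1/(2\ln 2)\approx 0.721$.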
V. A STRONG CONVERSE UPPER BOUND
A. The general case 

Theorem 5.1: For any $0<\varepsilon<1$,
$$C_{t,q}(\varepsilon) \le \min_{W\in\mathcal W_t}\ \max_{P_{X_1\cdots X_t}}\ \max_{i=1,\dots,t} I(X_i;Y\mid X_1,\dots,X_{i-1},X_{i+1},\dots,X_t), \tag{34}$$
where $X_1,\dots,X_t,Y$ are $q$-ary r.v.'s, $P_{Y|X_1\cdots X_t}=W$, and the maximization is over joint distributions such that $X_1,\dots,X_t$ are independent.

Proof: We borrow techniques from [1] in this proof. To ease understanding, the result is proved for the case $t=2$; the extension to arbitrary $t$ is straightforward. In the proof, all logarithms are to the base $q$.

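Consider a family of $n$-length codes $\{(f_k,\phi_k),\ k\in\mathcal K\}$ for $M$ users with probability distribution $\pi(k)$ over $\mathcal K$, which is 2-fingerprinting with $\varepsilon$-error ($0<\varepsilon<1$). Therefore
$$e_{\mathrm{avg}}(F,\Phi,W^n) \le \varepsilon \quad \text{for every } W\in\mathcal W_2.$$
Let $\mathbf x_i^{(k)} = f_k(i)$ denote the fingerprints and $D_i^{(k)} = \{\mathbf y : \phi_k(\mathbf y) = i\}$ denote the decoding regions for $i=1,\dots,M$, $k\in\mathcal K$. Then the above error criterion can be written as follows: for every $W\in\mathcal W_2$,
$$\sum_{k\in\mathcal K}\pi(k)\,\frac{1}{M^2}\sum_{i,j=1}^M W^n\bigl(D_i^{(k)}\cup D_j^{(k)} \bigm| \mathbf x_i^{(k)},\mathbf x_j^{(k)}\bigr) \ge 1-\varepsilon.$$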

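Fix some $W\in\mathcal W_2$. There exists a $k^* = k^*(W)\in\mathcal K$ such that
$$\frac{1}{M^2}\sum_{i,j=1}^M W^n\bigl(D_i^{(k^*)}\cup D_j^{(k^*)} \bigm| \mathbf x_i^{(k^*)},\mathbf x_j^{(k^*)}\bigr) \ge 1-\varepsilon.$$
Hereafter, we drop the superscript $k^*$ for simplicity. Since $W^n(D_i\cup D_j|\cdot)\le W^n(D_i|\cdot)+W^n(D_j|\cdot)$, either
$$\frac{1}{M^2}\sum_{i,j=1}^M W^n(D_i|\mathbf x_i,\mathbf x_j) \ge \frac{1-\varepsilon}{2} \tag{35}$$
or
$$\frac{1}{M^2}\sum_{i,j=1}^M W^n(D_j|\mathbf x_i,\mathbf x_j) \ge \frac{1-\varepsilon}{2} \tag{36}$$
must be true. Let us assume (35) is true. We first find a subset $\mathcal A$ of "good" pairs of users (messages) for $W$. Define
$$\mathcal A = \{(i,j) : W^n(D_i|\mathbf x_i,\mathbf x_j) > 1-\bar\varepsilon,\ 1\le i,j\le M\}, \tag{37}$$
where $\bar\varepsilon$ is such that $0 < 1-\bar\varepsilon < (1-\varepsilon)/2$. Then
$$|\mathcal A| \ge (1-\varepsilon^*)M^2,\quad\text{where } \varepsilon^* \triangleq \frac{1+\varepsilon}{2\bar\varepsilon} < 1. \tag{38}$$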
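The step from (35) to (38) is an averaging argument: with $a=|\mathcal A|/M^2$, pairs in $\mathcal A$ contribute at most 1 to the average in (35) and the remaining pairs at most $1-\bar\varepsilon$ (the threshold parameter in (37)), so $(1-\varepsilon)/2 \le a + (1-a)(1-\bar\varepsilon)$, which rearranges to $a \ge 1-(1+\varepsilon)/(2\bar\varepsilon)$. The Python sketch below checks this on randomly generated, purely hypothetical success probabilities; the names `eps_star` and `check_good_pairs` are ours, not the paper's.

```python
import random


def eps_star(eps: float, eps_bar: float) -> float:
    """epsilon* = (1 + eps) / (2 eps_bar) from (38)."""
    return (1.0 + eps) / (2.0 * eps_bar)


def check_good_pairs(M: int = 60, seed: int = 1) -> tuple:
    """Return (|A| / M^2, 1 - eps*) for hypothetical success probabilities."""
    rng = random.Random(seed)
    # Hypothetical values standing in for W^n(D_i | x_i, x_j), i, j = 1..M (mean about 1/3).
    succ = [[rng.random() ** 2 for _ in range(M)] for _ in range(M)]
    mean = sum(map(sum, succ)) / M**2
    eps = 1.0 - 2.0 * mean      # chosen so that (35) holds with equality
    eps_bar = 1.0 - mean / 2.0  # satisfies 0 < 1 - eps_bar < (1 - eps) / 2
    size_a = sum(1 for row in succ for s in row if s > 1.0 - eps_bar)  # |A| as in (37)
    return size_a / M**2, 1.0 - eps_star(eps, eps_bar)


if __name__ == "__main__":
    print(check_good_pairs())
```

The inequality holds for any matrix of values in $[0,1]$, so the random data only serve to make the bookkeeping in (37) and (38) concrete.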

Next, we derive a subset $\bar{\mathcal A}\subseteq\mathcal A$ of the "good" pairs on which approximate independence holds between the fingerprints (codewords) corresponding to a pair of users (messages) uniformly distributed over this subset. This is needed to restrict the maximization in the final result (34) to joint distributions where the r.v.'s are independent.

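Lemma 5.2 [1]: Let $\mathcal C = \{\mathbf x_1,\dots,\mathbf x_M\}\subseteq\mathcal Q^n$ and $\mathcal A\subseteq\{1,\dots,M\}\times\{1,\dots,M\}$ with $|\mathcal A|\ge(1-\varepsilon^*)M^2$, $0<\varepsilon^*<1$. Then for any $0<\gamma<\varepsilon^*/(1-\varepsilon^*)$ and $0<\lambda<1$, there exist $l_1,\dots,l_r\in[n]$, where $r\le\varepsilon^*/(\gamma(1-\varepsilon^*))$, and some $(x_1,x'_1),\dots,(x_r,x'_r)$, such that for
$$\bar{\mathcal A} = \{(i,j)\in\mathcal A : x_{il_m} = x_m,\ x_{jl_m} = x'_m,\ \forall m\in[r]\}$$
(a) $|\bar{\mathcal A}| \ge \lambda^r|\mathcal A|$, and
(b) for all $x_1,x_2\in\mathcal Q$, $l\in[n]$,
$$\begin{aligned}
(1+\gamma)\Pr(X_{1l}=x_1)\Pr(X_{2l}=x_2) - \gamma - |\mathcal Q|^2\lambda
&\le \Pr(X_{1l}=x_1, X_{2l}=x_2)\\
&\le \max\bigl\{(1+\gamma)\Pr(X_{1l}=x_1)\Pr(X_{2l}=x_2),\ \lambda\bigr\},
\end{aligned}$$
where $(\mathbf X_1,\mathbf X_2)$ is a pair of r.v.'s with uniform distribution on $\{(\mathbf x_i,\mathbf x_j) : (i,j)\in\bar{\mathcal A}\}$.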

Applying Lemma 5.2 to $\mathcal A$ as in (37) with parameters $\gamma = n^{-1/2}$, $\lambda = n^{-1}$, we obtain
$$|\bar{\mathcal A}| \ge \lambda^r|\mathcal A| \quad\text{for some } r\le n^{1/2}\varepsilon^*/(1-\varepsilon^*). \tag{39}$$
For $j=1,\dots,M$, define $B(j) = \{i : (i,j)\in\bar{\mathcal A},\ 1\le i\le M\}$. Observe that the subcode corresponding to $B(j)$ is a "good" code for the single-user channel obtained by fixing the second input to the codeword $\mathbf x_j$. Thus, the single-user strong converse given below holds for this subcode.

Lemma 5.3 [5]: If $(f,\phi)$ is a code with codewords $\{\mathbf x_1,\dots,\mathbf x_M\}\subseteq\mathcal Q^n$ and decoding regions $D_i$, $i=1,\dots,M$, for the (non-stationary) single-user DMC $\{W_l\}_{l=1}^n$ such that $\Pr(D_i|\mathbf x_i)\ge 1-\varepsilon$ for every $i=1,\dots,M$, where $0\le\varepsilon<1$, then
$$\log M \le \sum_{l=1}^n I(X_l;Y_l) + O(n^{1/2}),$$
where $\mathbf X$ is distributed uniformly on the set of codewords.
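Using Lemma 5.3 on the subcode $B(j)$,
$$\log|B(j)| \le \sum_{l=1}^n I(X_{1l};Y_l \mid X_{2l} = x_{jl}) + O(n^{1/2}), \tag{40}$$
where $x_{jl}$ is the $l$-th symbol of $\mathbf x_j$, $(\mathbf X_1,\mathbf X_2)$ are distributed as in Lemma 5.2, and $P_{\mathbf Y|\mathbf X_1\mathbf X_2} = W^n$. Furthermore, using (40), we obtain
$$\begin{aligned}
|\bar{\mathcal A}|^{-1}\sum_{(i,j)\in\bar{\mathcal A}}\log|B(j)|
&\le |\bar{\mathcal A}|^{-1}\sum_{(i,j)\in\bar{\mathcal A}}\sum_{l=1}^n\sum_{x\in\mathcal Q} I(X_{1l};Y_l\mid X_{2l}=x)\,\mathbb{1}(x_{jl}=x) + O(n^{1/2})\\
&= \sum_{l=1}^n\sum_{x\in\mathcal Q} I(X_{1l};Y_l\mid X_{2l}=x)\,|\bar{\mathcal A}|^{-1}\sum_{(i,j)\in\bar{\mathcal A}}\mathbb{1}(x_{jl}=x) + O(n^{1/2})\\
&= \sum_{l=1}^n I(X_{1l};Y_l\mid X_{2l}) + O(n^{1/2}), \qquad (41)
\end{aligned}$$
since $\Pr(X_{2l}=x) = |\bar{\mathcal A}|^{-1}\sum_{(i,j)\in\bar{\mathcal A}}\mathbb{1}(x_{jl}=x)$ for $l\in[n]$.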
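We next establish a lower bound on the left-hand side in order to obtain an inequality for $M$:
$$\begin{aligned}
|\bar{\mathcal A}|^{-1}\sum_{(i,j)\in\bar{\mathcal A}}\log|B(j)| &= |\bar{\mathcal A}|^{-1}\sum_{j=1}^M |B(j)|\log|B(j)| \qquad (42)\\
&\ge |\bar{\mathcal A}|^{-1}\sum_{j:\,|B(j)|\ge\frac{1-\varepsilon^*}{n}M\lambda^r} |B(j)|\log|B(j)|\\
&\ge |\bar{\mathcal A}|^{-1}\log\Bigl(\frac{1-\varepsilon^*}{n}M\lambda^r\Bigr)\sum_{j:\,|B(j)|\ge\frac{1-\varepsilon^*}{n}M\lambda^r}|B(j)|, \qquad (43)
\end{aligned}$$
where (42) follows from the definition of $B(j)$. Now,
$$\begin{aligned}
\sum_{j:\,|B(j)|\ge\frac{1-\varepsilon^*}{n}M\lambda^r}|B(j)| &= \sum_{j=1}^M|B(j)| - \sum_{j:\,|B(j)|<\frac{1-\varepsilon^*}{n}M\lambda^r}|B(j)|\\
&\ge |\bar{\mathcal A}| - \frac{1-\varepsilon^*}{n}M^2\lambda^r\\
&\ge |\bar{\mathcal A}| - \frac1n|\bar{\mathcal A}|
\end{aligned}$$
by using (38) and (39). Using this inequality in (43), we get
$$|\bar{\mathcal A}|^{-1}\sum_{(i,j)\in\bar{\mathcal A}}\log|B(j)| \ge \Bigl(1-\frac1n\Bigr)\log\Bigl(\frac{1-\varepsilon^*}{n}M\lambda^r\Bigr). \tag{44}$$
Combining (41), (44) and (39),
$$\begin{aligned}
\log M &\le \Bigl(1+\frac{1}{n-1}\Bigr)\Bigl(\sum_{l=1}^n I(X_{1l};Y_l\mid X_{2l}) + O(n^{1/2})\Bigr) - \log(1-\varepsilon^*) + \log n + \frac{\varepsilon^*}{1-\varepsilon^*}\,n^{1/2}\log n\\
&\le \sum_{l=1}^n I(X_{1l};Y_l\mid X_{2l}) + O(n^{1/2}\log n). \qquad (45)
\end{aligned}$$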



Although the above inequality resembles what is needed in the theorem, note that $X_{1l}$ and $X_{2l}$ are not necessarily independent. For $l\in[n]$, let $(\tilde X_{1l},\tilde X_{2l},\tilde Y_l)$ be r.v.'s with distribution
$$\Pr(\tilde X_{1l}=x_1,\tilde X_{2l}=x_2,\tilde Y_l=y) = \Pr(X_{1l}=x_1)\Pr(X_{2l}=x_2)\,W(y|x_1,x_2)$$
for all $x_1,x_2,y\in\mathcal Q$. From Lemma 5.2(b), for $n^{-1/2}\ge|\mathcal Q|^2n^{-1}$ and every $l\in[n]$,
$$\begin{aligned}
(1+n^{-1/2})\Pr(X_{1l}=x_1)\Pr(X_{2l}=x_2) - 2n^{-1/2} &\le \Pr(X_{1l}=x_1,X_{2l}=x_2)\\
&\le (1+n^{-1/2})\Pr(X_{1l}=x_1)\Pr(X_{2l}=x_2) + n^{-1},
\end{aligned}$$
i.e.,
$$\bigl|\Pr(X_{1l}=x_1,X_{2l}=x_2) - \Pr(\tilde X_{1l}=x_1,\tilde X_{2l}=x_2)\bigr| \le 2n^{-1/2}.$$

Thus, by the uniform continuity of mutual information, for all $l\in[n]$,
$$\bigl|I(X_{1l};Y_l|X_{2l}) - I(\tilde X_{1l};\tilde Y_l|\tilde X_{2l})\bigr| \le \alpha_n,$$
where $\alpha_n\to 0$ as $n\to\infty$. Together with (45), and dividing by $n$,
$$R \le \max_{P_{X_1X_2}=P_{X_1}P_{X_2}} I(X_1;Y|X_2) + \beta_n, \tag{46}$$
where $\beta_n = \alpha_n + O(n^{-1/2}\log n)\to 0$ as $n\to\infty$. Similarly, assuming (36) is true, one can prove
$$R \le \max_{P_{X_1X_2}=P_{X_1}P_{X_2}} I(X_2;Y|X_1) + \beta_n. \tag{47}$$
Since either (46) or (47) holds for every $W\in\mathcal W_2$, taking $n\to\infty$ concludes the proof. ■

B. The binary case

Fix $\mathcal Q=\{0,1\}$. For $t=2$ and $t=3$, we again pick the uniform channel and obtain upper bounds on the expression in Theorem 5.1, which turn out to be stronger than the bounds resulting from (32). The calculations become quite tedious for larger $t$. For $t=2$, let $X_1,X_2$ be independent binary-valued r.v.'s with $P(X_i=1)=p_i$, $0\le p_i\le 1$, $i=1,2$, and let $Y$ be the output of the uniform channel with inputs $X_1$ and $X_2$. We have
$$\begin{aligned}
H(Y|X_2) &= (1-p_2)\,h\Bigl(\frac{p_1}{2}\Bigr) + p_2\,h\Bigl(\frac{1+p_1}{2}\Bigr),\\
H(Y|X_1,X_2) &= (1-p_1)p_2 + p_1(1-p_2).
\end{aligned}$$

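Computing the maximum of the conditional mutual information $I(X_1;Y|X_2) = H(Y|X_2) - H(Y|X_1,X_2)$ gives $C_{2,2}\le 0.322$. A similar computation for $t=3$ yields
$$\begin{aligned}
H(Y|X_2,X_3) &= (1-p_2)(1-p_3)\,h\Bigl(\frac{p_1}{3}\Bigr) + \bigl[(1-p_2)p_3 + p_2(1-p_3)\bigr]\,h\Bigl(\frac{1+p_1}{3}\Bigr) + p_2p_3\,h\Bigl(\frac{1-p_1}{3}\Bigr),\\
H(Y|X_1,X_2,X_3) &= \bigl(1 - p_1p_2p_3 - (1-p_1)(1-p_2)(1-p_3)\bigr)\,h\Bigl(\frac13\Bigr),
\end{aligned}$$
and the maximization gives $C_{3,2}\le 0.199$. Combining these upper bounds with our lower bounds from Theorem 3.1 and Theorem 3.2, we obtain:

Theorem 5.4:
$$0.25 \le C_{2,2} \le 0.322,$$
$$0.083 \le C_{3,2} \le 0.199.$$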
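The two maximizations behind Theorem 5.4 are easy to reproduce numerically. The Python sketch below is our own illustration (the function names are hypothetical); it evaluates $I(X_1;Y|X_2)$ and $I(X_1;Y|X_2,X_3)$ from the entropy expressions displayed above and maximizes them over a grid of $(p_1,p_2)$ and $(p_1,p_2,p_3)$. Because the uniform channel is symmetric in its inputs, maximizing the conditional mutual information of $X_1$ also covers the maximum over $i$ in (34).

```python
import math


def h(x: float) -> float:
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def cmi_t2(p1: float, p2: float) -> float:
    """I(X1; Y | X2) = H(Y|X2) - H(Y|X1,X2) for the uniform channel, t = 2."""
    h_y_x2 = (1 - p2) * h(p1 / 2) + p2 * h((1 + p1) / 2)
    h_y_x12 = (1 - p1) * p2 + p1 * (1 - p2)
    return h_y_x2 - h_y_x12


def cmi_t3(p1: float, p2: float, p3: float) -> float:
    """I(X1; Y | X2, X3) = H(Y|X2,X3) - H(Y|X1,X2,X3) for the uniform channel, t = 3."""
    h_y_x23 = ((1 - p2) * (1 - p3) * h(p1 / 3)
               + ((1 - p2) * p3 + p2 * (1 - p3)) * h((1 + p1) / 3)
               + p2 * p3 * h((1 - p1) / 3))
    h_y_x123 = (1 - p1 * p2 * p3 - (1 - p1) * (1 - p2) * (1 - p3)) * h(1 / 3)
    return h_y_x23 - h_y_x123


def grid(steps: int) -> list:
    """Equally spaced points 0, 1/steps, ..., 1."""
    return [k / steps for k in range(steps + 1)]


def bound_t2(steps: int = 500) -> float:
    g = grid(steps)
    return max(cmi_t2(a, b) for a in g for b in g)


def bound_t3(steps: int = 50) -> float:
    g = grid(steps)
    return max(cmi_t3(a, b, c) for a in g for b in g for c in g)


if __name__ == "__main__":
    print(f"t=2: {bound_t2():.4f}, t=3: {bound_t3():.4f}")
```

One can also locate the maxima by hand: the objective is affine in $p_2$ (and, for $t=3$, in each of $p_2,p_3$ separately), so the maximum is attained with the other inputs deterministic. For $t=2$ this reduces to $\max_s\{h(s)-2s\}=\log_2(5/4)\approx 0.3219$, and for $t=3$ to $\max_s\{h(s)-3h(1/3)s\}=\log_2(31/27)\approx 0.1993$, consistent with the values 0.322 and 0.199 stated above.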

VI. Conclusion 

In this paper, we prove new lower bounds on the maximum rate of binary fingerprinting codes for 2 and 3 pirates by considering typical coalitions, improving upon the random coding results obtained previously in the literature. We also prove several new upper bounds on fingerprinting capacity relying upon converse theorems for a class of channels which are similar to the multiple-access channel. For the binary case, our results establish $C_{t,2}\le(t\ln 2)^{-1}$. Combined with the result of [17], this implies that $\Omega(1/t^2)\le C_{t,2}\le O(1/t)$. For the general case with arbitrary alphabets, we have established some upper bounds on the capacity involving single-letter mutual information quantities.






IEEE TRANSACTIONS ON INFORMATION THEORY 



Appendix 

A. A lemma on the size of coalitions 

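Lemma A.1: Let $(F,\Phi)$ be a randomized code of size at least $2t-1$. Assume that
$$e_{\max}(F,\Phi,V) \le \varepsilon \quad\text{for every } V\in\mathcal V_t. \tag{48}$$
Then for any $\tau<t$,
$$e_{\max}(F,\Phi,V) \le 2\varepsilon \quad\text{for every } V\in\mathcal V_\tau.$$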

Proof: For simplicity of presentation we take $\tau = t-1$. The general case $1\le\tau<t$ can be established with only minor changes to the proof below. For any $V\in\mathcal V_{t-1}$, let us define $V'\in\mathcal V_t$ by
$$V'(y|x_1,\dots,x_{t-1},x_t) = V(y|x_1,\dots,x_{t-1}),\quad \forall\, x_1,\dots,x_t,y\in\mathcal Q^n.$$

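Then, for any coalition $U$ of size $t-1$ and any user $u\notin U$,
$$\begin{aligned}
e(U,F,\Phi,V) &= \mathbb E_K\sum_{y:\,\phi_K(y)\notin U} V(y|f_K(U))\\
&= \mathbb E_K\sum_{y:\,\phi_K(y)\notin U} V'(y|f_K(U),f_K(u))\\
&= \mathbb E_K\Bigl[\sum_{y:\,\phi_K(y)\notin U'} V'(y|f_K(U')) + \sum_{y:\,\phi_K(y)=u} V'(y|f_K(U'))\Bigr], \qquad (49)
\end{aligned}$$
where $U' = U\cup\{u\}$. The first term in the last expression satisfies
$$e(U',F,\Phi,V') \le \varepsilon \tag{50}$$
by the assumption of the lemma. We will next show that the second term in (49) is also at most $\varepsilon$, which gives $e(U,F,\Phi,V)\le 2\varepsilon$. Suppose, for the sake of contradiction, that
$$\mathbb E_K\sum_{y:\,\phi_K(y)=u} V'(y|f_K(U')) > \varepsilon.$$
Let $u'\notin U'$ and $U'' = U\cup\{u'\}$ (we assume that the size of the code is at least $t+1$, or at least $2t-1$ in the general case). Since $V'$ does not depend on its last argument, $V'(y|f_K(U'')) = V'(y|f_K(U'))$, and since $u\notin U''$,
$$e(U'',F,\Phi,V') = \mathbb E_K\sum_{y:\,\phi_K(y)\notin U''} V'(y|f_K(U'')) \ge \mathbb E_K\sum_{y:\,\phi_K(y)=u} V'(y|f_K(U')) > \varepsilon.$$
But this contradicts our initial assumption (48). ■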



B. Proof of Proposition 2.6

The capacity under the maximum-error criterion clearly does not exceed the capacity under the average-error criterion. Therefore, it is enough to show that for every randomized code $(F,\Phi)$ there exists another randomized code $(F^*,\Phi^*)$ of the same rate such that $e_{\max}(F^*,\Phi^*,V) = e_{\mathrm{avg}}(F,\Phi,V)$ for every channel $V$.

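We are given $\{(f_k,\phi_k),\ k\in\mathcal K\}$. Let $\sigma\in\Sigma$ identify a particular permutation from the set $\Sigma$ of all permutations on the message set $\mathcal M$. Choose $\sigma$ uniformly at random from $\Sigma$ and construct a new key $\tilde k = (k,\sigma)$. Define
$$f^*_{\tilde k}(\cdot) = f_k(\sigma(\cdot)),\qquad \phi^*_{\tilde k}(\cdot) = \sigma^{-1}(\phi_k(\cdot)).$$
Let $(F^*,\Phi^*)$ be the randomized code corresponding to the family $\{(f^*_{\tilde k},\phi^*_{\tilde k}),\ \tilde k\in\mathcal K\times\Sigma\}$. Then, for every channel $V$, $e_{\mathrm{avg}}(F^*,\Phi^*,V) = e_{\mathrm{avg}}(F,\Phi,V)$. Furthermore, for any $U\subseteq\mathcal M$, $|U|=t$,
$$e(U,F^*,\Phi^*,V) = \frac{1}{|\Sigma|}\sum_{\sigma\in\Sigma}\sum_{k\in\mathcal K}\pi(k)\sum_{y:\,\phi_k(y)\notin\sigma(U)} V(y|f_k(\sigma(U))),$$
which does not depend on the subset $U$ because of the averaging over all permutations. This implies $e_{\max}(F^*,\Phi^*,V) = e_{\mathrm{avg}}(F^*,\Phi^*,V)$.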
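The key step in this construction is that, as $\sigma$ ranges over $\Sigma$, the image $\sigma(U)$ of a fixed $t$-subset $U$ is uniformly distributed over all $t$-subsets of $\mathcal M$. The following Python sketch is our own toy illustration: `symmetrized_errors` and the error values are hypothetical, and it assumes that $e_{\mathrm{avg}}$ is the uniform average of $e(U,F,\Phi,V)$ over $t$-subsets. Starting from deliberately unequal per-coalition errors, it checks that the symmetrized errors all coincide with their average.

```python
import itertools
from fractions import Fraction


def symmetrized_errors(M: int, err: dict) -> dict:
    """Per-coalition error e(U, F*, Phi*, V) of the permuted code.

    `err` maps every t-subset U of {0, ..., M-1} (as a frozenset) to a
    hypothetical error e(U, F, Phi, V) of the original code. Under key (k, sigma)
    the coalition U uses the fingerprints f_k(sigma(U)), and the decoder
    sigma^{-1}(phi_k(.)) errs exactly when phi_k(y) is not in sigma(U), so
    e(U, F*, Phi*, V) is the average of e(sigma(U), F, Phi, V) over all sigma.
    """
    perms = list(itertools.permutations(range(M)))
    return {
        U: sum((err[frozenset(s[u] for u in U)] for s in perms), Fraction(0)) / len(perms)
        for U in err
    }


if __name__ == "__main__":
    M, t = 4, 2
    subsets = [frozenset(c) for c in itertools.combinations(range(M), t)]
    # Deliberately unequal, made-up per-coalition errors of the original code.
    err = {U: Fraction(k + 1, 20) for k, U in enumerate(subsets)}
    sym = symmetrized_errors(M, err)
    e_avg = sum(err.values(), Fraction(0)) / len(err)
    print(sorted(set(sym.values())), e_avg)
```

Exact rational arithmetic is used so that the equality of all symmetrized errors can be checked without floating-point tolerances.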



C. Proof of Theorem 4.3

Our goal is to estimate $\max_{p\in[0,1]} u(p,t)$, where we use the following notation:
$$u(p,t) = h(p) - \sum_{i=0}^t \alpha_i\, h\Bigl(\frac it\Bigr),\qquad \alpha_i = \binom ti p^i(1-p)^{t-i}.$$
First, note that $h$ is a concave function and $\sum_i \alpha_i\, i/t = p$; therefore, by Jensen's inequality, $u(p,t)$ is non-negative for all $p\in[0,1]$. Bernstein proved that the sequence of polynomials $B_t(p) = \sum_{i=0}^t \alpha_i f(i/t)$, $t=1,2,\dots$, where $f$ is a function continuous on $[0,1]$, provides a uniform approximation to $f$ on $[0,1]$. His proof, found for instance in Feller [13], §7.2, relies on the weak law of large numbers. Refining the proof in the case of the function $h$, we show that for any $p\in[0,1]$ and any $t$,
$$u(p,t) \le \frac{1}{t\ln 2}. \tag{51}$$

By the symmetry $u(p,t) = u(1-p,t)$, and since $u(0,t)=0$, it suffices to consider the case $p\in(0,1/2]$. Given some $x = i/t$, let us write a quadratic Taylor approximation for $h(x)$:
$$h(x) = h(p) + (x-p)\log_2\frac{1-p}{p} + \frac{(x-p)^2}{2}\,a(x), \tag{52}$$
where the coefficient $a(x)$ depends on $x$, since $a(x) = h''(\theta)$ for some $\theta$ between $x$ and $p$. We shall also consider the residual function
$$g(x) = h(x) - h(p) - (x-p)\log_2\frac{1-p}{p} = \frac{(x-p)^2}{2}\,a(x).$$

Fig. 2. To the proof that $a(x_0)\ge a(0)$.



The main part of our proof is to show that for any $x\in[0,1]$,
$$2p^{-2}\log_2(1-p) \le a(x) \le 0. \tag{53}$$
The right inequality is obvious since $h''(x)<0$ for all $0<x<1$. The left inequality will be proven in two steps.

Let us take any point $x_0\in[0,p)$. Then we compare $g(x)$ with the quadratic function
$$g_{x_0}(x) = a(x_0)\,\frac{(x-p)^2}{2}$$
on the entire interval $x\in[0,p]$. We first prove that the functions $g_{x_0}(x)$ and $g(x)$ coincide at only two points, namely $p$ and $x_0$. Indeed, let us assume that there exists a third such point $x_1$. Without loss of generality, let $x_0<x_1<p$. The functions $g_{x_0}(x)$ and $g(x)$ coincide at the ends of both intervals $[x_0,x_1]$ and $[x_1,p]$; therefore, there exist two points $\theta'\in(x_0,x_1)$ and $\theta''\in(x_1,p)$ where both functions have equal derivatives:
$$\begin{aligned}
a(x_0)(\theta'-p) &= \log_2\frac{1-\theta'}{\theta'} - \log_2\frac{1-p}{p},\\
a(x_0)(\theta''-p) &= \log_2\frac{1-\theta''}{\theta''} - \log_2\frac{1-p}{p}.
\end{aligned}$$
The left sides of both equalities represent a linear function of $\theta$ given by $a(x_0)(\theta-p)$, whereas the right sides represent the function $\log_2\frac{1-\theta}{\theta} - \log_2\frac{1-p}{p}$, which is strictly convex for $\theta\in(0,1/2]$. Both functions also vanish at $\theta=p$, so they would intersect at the three points $\theta'<\theta''<p$. However, a linear function can intersect a strictly convex function at no more than two points. This contradiction shows that no such point $x_1$ exists, i.e., the functions $g_{x_0}(x)$ and $g(x)$ intersect only at the two points $p$ and $x_0$.

Our next step is to find the minimum of $a(x)\le 0$ over $x\in[0,p]$. Compare the function $g_{x_0}(x)$ with $g_0(x)$ for any parameter $x_0\in(0,p)$. Both functions intersect $g(x)$ at only two points, one of which is $x=p$. However, $g_0(x)$ has its second intersection $x=0$ to the left of $x_0$. Thus, $g_0(x)\le g_{x_0}(x)$ for $0\le x\le p$, and therefore $a(x_0)\ge a(0)$ (see Fig. 2). Now we conclude that
$$a(0) = \min_{x\in[0,p]} a(x).$$
Finally, we find $a(0)$ using the equality $g_0(0) = g(0) = \log_2(1-p)$, which gives $a(0) = 2p^{-2}\log_2(1-p)$.

The second interval $x\in[p,1]$ can be considered similarly. Using the same arguments, we conclude that the end point $x=1$ gives the minimum $a(1) = \min_{x\in[p,1]} a(x)$, where $a(1) = 2(1-p)^{-2}\log_2 p$. Direct calculation also shows that the global minimum is achieved at $x=0$, as $a(0)<a(1)$ for all $p<1/2$, and $a(1)=a(0)$ for $p=1/2$. This gives us the left inequality in (53) and shows that for any $p\le 1/2$ and any $x\in[0,1]$,
$$h(x) \ge h(p) + (x-p)\log_2\frac{1-p}{p} + (x-p)^2\,\frac{\log_2(1-p)}{p^2}.$$

Let us take $x = i/t$, $i=0,1,\dots,t$, and substitute the above estimate into the expression for $u(p,t)$. In this substitution, we also use the first two moments of the binomial distribution $\{\alpha_i\}$, which give
$$\sum_{i=0}^t \alpha_i\Bigl(\frac it - p\Bigr) = 0,\qquad \sum_{i=0}^t \alpha_i\Bigl(\frac it - p\Bigr)^2 = \frac{p(1-p)}{t}.$$
Then
$$u(p,t) \le -\frac{p(1-p)}{t}\cdot\frac{\log_2(1-p)}{p^2} = -\frac{(1-p)\ln(1-p)}{p}\cdot\frac{1}{t\ln 2}.$$
Finally, it is easy to verify that the function $-(1-p)\ln(1-p)/p$ monotonically decreases on the interval $(0,1/2]$ and tends to its supremum 1 as $p\to 0$. This establishes (51) and hence the bound (33).
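The two analytic facts used in this proof, the quadratic minorant of $h$ and the resulting estimate of $u(p,t)$, can be spot-checked numerically. The Python sketch below is our own (helper names such as `quad_minorant` and `penultimate_bound` are hypothetical); it evaluates them on a grid of $p\in(0,1/2]$ and $x\in[0,1]$. A finite grid can only corroborate the argument above, not replace it.

```python
import math


def h(x: float) -> float:
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def quad_minorant(x: float, p: float) -> float:
    """h(p) + (x-p) log2((1-p)/p) + (x-p)^2 log2(1-p) / p^2, for 0 < p <= 1/2."""
    return (h(p) + (x - p) * math.log2((1 - p) / p)
            + (x - p) ** 2 * math.log2(1 - p) / p**2)


def u(p: float, t: int) -> float:
    """u(p, t) = h(p) - sum_i alpha_i h(i/t), alpha_i = C(t,i) p^i (1-p)^(t-i)."""
    return h(p) - sum(math.comb(t, i) * p**i * (1 - p) ** (t - i) * h(i / t)
                      for i in range(t + 1))


def penultimate_bound(p: float, t: int) -> float:
    """-(1-p) ln(1-p) / (p t ln 2): the bound on u(p, t) before the last step."""
    return -(1 - p) * math.log(1 - p) / (p * t * math.log(2))


if __name__ == "__main__":
    ps = [k / 200 for k in range(1, 101)]   # grid on (0, 1/2]
    xs = [k / 400 for k in range(401)]      # grid on [0, 1]
    worst = min(h(x) - quad_minorant(x, p) for p in ps for x in xs)
    print("min of h(x) - minorant over the grid:", worst)
    for t in (2, 3, 5, 10):
        gap = max(u(p, t) - penultimate_bound(p, t) for p in ps)
        print(f"t={t}: max(u - bound) = {gap:.3e}, 1/(t ln 2) = {1 / (t * math.log(2)):.4f}")
```

The minorant touches $h$ at $x=p$ and at $x=0$ (and also at $x=1$ when $p=1/2$), so the smallest gap printed should be zero up to rounding.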